Search | VHL Regional Portal

1.

MetaTron: advancing biomedical annotation empowering relation annotation and collaboration.

Irrera, Ornella; Marchesin, Stefano; Silvello, Gianmaria.

BMC Bioinformatics ; 25(1): 112, 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38486137

ABSTRACT

BACKGROUND: The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large, often manually or semi-manually, annotated corpora vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming, especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations, also integrating automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools. RESULTS: We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of time and number of clicks to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance. CONCLUSIONS: MetaTron stands out as one of the few annotation tools targeting the biomedical domain supporting the annotation of relations, and fully customizable with documents in several formats (PDF included), as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a locally deployable Docker image.

Subjects

Power, Psychological ; Semantics ; PubMed

2.

Modelling digital health data: The ExaMode ontology for computational pathology.

Menotti, Laura; Silvello, Gianmaria; Atzori, Manfredo; Boytcheva, Svetla; Ciompi, Francesco; Di Nunzio, Giorgio Maria; Fraggetta, Filippo; Giachelle, Fabio; Irrera, Ornella; Marchesin, Stefano; Marini, Niccolò; Müller, Henning; Primov, Todor.

J Pathol Inform ; 14: 100332, 2023.

Article in English | MEDLINE | ID: mdl-37705689

ABSTRACT

Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology to overcome data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge and datasets cannot be steadily reused in diverse contexts due to heterogeneity issues of the adopted labels, multilingualism, and different clinical practices. Material and methods: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology. Results: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data. Discussion: The ExaMode ontology is a key means to store data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which we can extract relevant clinical insights about the considered diseases using SPARQL queries.

3.

Building a large gene expression-cancer knowledge base with limited human annotations.

Marchesin, Stefano; Menotti, Laura; Giachelle, Fabio; Silvello, Gianmaria; Alonso, Omar.

Database (Oxford) ; 2023, 2023 Sep 27.

Article in English | MEDLINE | ID: mdl-37768281

ABSTRACT

Cancer prevention is one of the most pressing challenges that public health needs to face. In this regard, data-driven research is central to assist medical solutions targeting cancer. To fully harness the power of data-driven research, it is imperative to have machine-readable facts well organized in a knowledge base (KB). Motivated by this urgent need, we introduce the Collaborative Oriented Relation Extraction (CORE) system for building KBs with limited manual annotations. CORE is based on the combination of distant supervision and active learning paradigms and offers a seamless, transparent, modular architecture equipped for large-scale processing. We focus on precision medicine and build the largest KB on 'fine-grained' gene expression-cancer associations, a key to complement and validate experimental data for cancer research. We show the robustness of CORE and discuss the usefulness of the provided KB. Database URL: https://zenodo.org/record/7577127.

Subjects

Neoplasms ; Humans ; Neoplasms/genetics ; Databases, Factual ; Knowledge Bases ; Precision Medicine ; Gene Expression

4.

Empowering digital pathology applications through explainable knowledge extraction tools.

Marchesin, Stefano; Giachelle, Fabio; Marini, Niccolò; Atzori, Manfredo; Boytcheva, Svetla; Buttafuoco, Genziana; Ciompi, Francesco; Di Nunzio, Giorgio Maria; Fraggetta, Filippo; Irrera, Ornella; Müller, Henning; Primov, Todor; Vatrano, Simona; Silvello, Gianmaria.

J Pathol Inform ; 13: 100139, 2022.

Article in English | MEDLINE | ID: mdl-36268087

ABSTRACT

Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that is still largely unexploited. In order to allow decoding medical knowledge included in reports, we propose an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET). Combining rule-based techniques and pre-trained ML models provides high accuracy results for knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques to extract critical information from cancer reports, opening opportunities such as data mining for knowledge extraction purposes, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical and unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system providing visual explanations about the algorithmic decisions taken by SKET. SKET X is designed/developed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.

5.

Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations.

Marini, Niccolò; Marchesin, Stefano; Otálora, Sebastian; Wodzinski, Marek; Caputo, Alessandro; van Rijthoven, Mart; Aswolinskiy, Witali; Bokhorst, John-Melle; Podareanu, Damian; Petters, Edyta; Boytcheva, Svetla; Buttafuoco, Genziana; Vatrano, Simona; Fraggetta, Filippo; van der Laak, Jeroen; Agosti, Maristella; Ciompi, Francesco; Silvello, Gianmaria; Müller, Henning; Atzori, Manfredo.

NPJ Digit Med ; 5(1): 102, 2022 Jul 22.

Article in English | MEDLINE | ID: mdl-35869179

ABSTRACT

The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components, to automatically extract semantically meaningful concepts from diagnostic reports and use them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3'769 clinical images and reports, provided by two hospitals and tested on over 11'000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image-level) based only on existing clinical data without the need for manual annotations.

6.

TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction.

Marchesin, Stefano; Silvello, Gianmaria.

BMC Bioinformatics ; 23(1): 111, 2022 Mar 31.

Article in English | MEDLINE | ID: mdl-35361129

ABSTRACT

BACKGROUND: Databases are fundamental to advance biomedical science. However, most of them are populated and updated with a great deal of human effort. Biomedical Relation Extraction (BioRE) aims to shift this burden to machines. Among its different applications, the discovery of Gene-Disease Associations (GDAs) is one of BioRE's most relevant tasks. Nevertheless, few resources have been developed to train models for GDA extraction. Besides, these resources are all limited in size, preventing models from scaling effectively to large amounts of data. RESULTS: To overcome this limitation, we have exploited the DisGeNET database to build a large-scale, semi-automatically annotated dataset for GDA extraction. DisGeNET stores one of the largest available collections of genes and variants involved in human diseases. Relying on DisGeNET, we developed TBGA: a GDA extraction dataset generated from more than 700K publications that consists of over 200K instances and 100K gene-disease pairs. Each instance consists of the sentence from which the GDA was extracted, the corresponding GDA, and the information about the gene-disease pair. CONCLUSIONS: TBGA is amongst the largest datasets for GDA extraction. We have evaluated state-of-the-art models for GDA extraction on TBGA, showing that it is a challenging and well-suited dataset for the task. We made the dataset publicly available to foster the development of state-of-the-art BioRE models for GDA extraction.

Subjects

Data Mining ; Research Design ; Databases, Factual ; Humans
